import pandas as pd
import altair as alt
# Data load-in
data = {
"values": [
{"Date": "Q1-2017", "Completion Rate": 0.91, "Response Rate": 0.023},
{"Date": "Q2-2017", "Completion Rate": 0.93, "Response Rate": 0.018},
{"Date": "Q3-2017", "Completion Rate": 0.91, "Response Rate": 0.028},
{"Date": "Q4-2017", "Completion Rate": 0.89, "Response Rate": 0.023},
{"Date": "Q1-2018", "Completion Rate": 0.84, "Response Rate": 0.034},
{"Date": "Q2-2018", "Completion Rate": 0.88, "Response Rate": 0.027},
{"Date": "Q3-2018", "Completion Rate": 0.91, "Response Rate": 0.026},
{"Date": "Q4-2018", "Completion Rate": 0.87, "Response Rate": 0.039},
{"Date": "Q1-2019", "Completion Rate": 0.83, "Response Rate": 0.028}
]
}
df = pd.DataFrame(data["values"])
Order = ["Q1-2017","Q2-2017","Q3-2017","Q4-2017","Q1-2018","Q2-2018","Q3-2018","Q4-2018","Q1-2019"]
# Completion Rate Chart
Completion_chart = alt.Chart(df, width=400, height=300).mark_line(color="blue").encode(
x=alt.X('Date:N', title="Date (measured quarterly)", sort=Order),
y=alt.Y('Completion Rate:Q', title="Completion Rate", axis=alt.Axis(titleColor="blue"))
)
# point
Completion_chart_point = alt.Chart(df, width=400, height=300).mark_point(color="blue",size = 20).encode(
x=alt.X('Date:N', title="Date (measured quarterly)", sort=Order),
y=alt.Y('Completion Rate:Q')
)
# Text
Text_completion = alt.Chart(df[df['Date'].isin(["Q1-2017", "Q1-2019"])]).mark_text(align='center', dy=10, color='blue').encode(
x=alt.X('Date:N', sort=Order),
y=alt.Y('Completion Rate:Q'),
text=alt.Text("Completion Rate:Q", format=".2f")
)
Completion_chart_text = alt.layer(Completion_chart, Text_completion,Completion_chart_point)
# Response Rate Chart
Response_chart = alt.Chart(df, width=400, height=300).mark_line(color="orange").encode(
x=alt.X('Date:N', title="Date (measured quarterly)", sort=Order),
y=alt.Y('Response Rate:Q', title="Response Rate", axis=alt.Axis(titleColor="orange"))
)
# point
Response_chart_point = alt.Chart(df, width=400, height=300).mark_point(color="orange", size = 20).encode(
x=alt.X('Date:N', title="Date (measured quarterly)", sort=Order),
y=alt.Y('Response Rate:Q')
)
# Response Rate Text
Text_response = alt.Chart(df[df['Date'].isin(["Q1-2017", "Q1-2019"])]).mark_text(align='center', dy=10, color='orange').encode(
x=alt.X('Date:N', sort=Order),
y=alt.Y('Response Rate:Q'),
text=alt.Text("Response Rate:Q", format=".3f")
)
Response_chart_text = alt.layer(Response_chart, Text_response, Response_chart_point)
# Layer the two charts on a shared x-axis with independent y-axes
Total_chart = alt.layer(Completion_chart_text, Response_chart_text).resolve_scale(y="independent").properties(
title=alt.TitleParams(
text="Completion and Response Rates of Survey from 2017 to 2019",
anchor="middle",
fontSize=16
)
).configure_axisX(
labelAngle=0
)
Total_chart
Data 304 Portfolio
Homework 6
Exercise 1
a. What is the most interesting lesson, guide, or piece of advice Tufte offers you in this chapter?
On page 37, Tufte wrote: “The problem with time-series is that the simple passage of time is not a good explanatory variable: descriptive chronology is not a causal explanation”.
This is interesting because we sometimes attribute a correlation in a time-series graphic to external causes and explanations. In fact, we cannot draw such a conclusion from the graphic alone. Nonetheless, the graphic gives us an idea of the trend and prompts further exploration.
b. Tufte shares some of his favorite graphics in this chapter. Pick one (but not the one about the military advance on and retreat from Russia) and answer the following.
- What page is your graphic on? [Take a screen shot and include the image as well, if you can.] Page 48.
- Why did you pick the graphic you chose?
This is an interesting trail graph, and I like how it challenges the assumption that the unemployment rate and inflation are inversely related.
- What encoding channels are used in the graphic? What variables are they associated with?
Position (X axis): male unemployment rate.
Position (Y axis): Increase in CPI.
Line (connect points): time.
Text: Guides for year.
- What, if any, elements of the graphic would be hard/impossible for you to implement in Vega-Lite (given what we know so far)?
I think everything in this graph would be possible to implement in Vega-Lite.
- What point is Tufte illustrating with this graphic?
It is an example of a relational graphic that links two variables, encouraging and imploring the viewer to assess the possible causal relationship between them. This graphic in particular confronts the commonly held belief that inflation and the unemployment rate are inversely related.
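As a rough illustration of the claim that this trail graph could be built in Vega-Lite, the sketch below assembles a connected-scatterplot spec as a Python dictionary. The dataset name `phillips` and the field names `unemployment`, `cpi_increase`, and `year` are hypothetical placeholders, not taken from Tufte's figure, and no data values are assumed.

```python
import json

# Hypothetical field names; the data would be supplied at render time under
# the placeholder dataset name "phillips".
position = {
    "x": {"field": "unemployment", "type": "quantitative",
          "title": "Male unemployment rate", "scale": {"zero": False}},
    "y": {"field": "cpi_increase", "type": "quantitative",
          "title": "Increase in CPI", "scale": {"zero": False}},
}

trail_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "phillips"},
    "layer": [
        {   # The order channel connects points by year rather than by x value.
            "mark": "line",
            "encoding": {**position,
                         "order": {"field": "year", "type": "quantitative"}},
        },
        {"mark": "point", "encoding": position},
        {   # Year labels placed slightly above each point.
            "mark": {"type": "text", "dy": -8},
            "encoding": {**position,
                         "text": {"field": "year", "type": "nominal"}},
        },
    ],
}

print(json.dumps(trail_spec, indent=2))
```

The printed JSON could be pasted into the Vega-Lite editor once a real dataset with matching columns is attached.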
Exercise 2
List one or two ideas that you learned in these sections that will change the way you design and create data graphics.
- On page 30, Tufte wrote that the power of data graphics should be reserved for richer, more complex, more difficult statistical material. If the data is simple enough to be summarized in one or two numbers, there is no need for a data graphic. This changed how I think about designing graphics. I did not care about efficiency before; I would divide the data into multiple graphics and let viewers sort through them. Tufte suggests that data graphics can convey the complexity of data, so a data graphic earns its place not only because it visualizes the data but also because it is more efficient than a data table.
Exercise 3
Exercise 2.13 from book:
Step 1: List three things that are not ideal about this graph.
- The guide is unclear. We don’t know what the response and completion rates mean. In addition, the X axis is unclear.
- Completion rates are drawn as bars and response rates as lines. This suggests that completion rate is treated as discrete data while response rate is treated as a continuous variable.
- The text labels and the y axis guides are redundant. In addition, the orange text labels such as “2.3%” overlap a little with the blue bars and sometimes with the orange lines.
Step 2: For each, describe how you would overcome the given challenges.
- I would change the title from “Response and Completion Rates” to “Response and Completion Rates of ____ from 2017 to 2019” and add a title to the X axis: “Time (measured quarterly)”.
- I would change the completion rate bars into lines.
- I would show only the beginning and ending values as text, leaving the rest to the y axis guide.
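The third fix (labeling only the first and last values) can be sketched in plain Python. The completion rates are the ones used for this exercise's chart; the helper name `endpoints` is made up for illustration.

```python
# Quarterly completion rates from this exercise's chart data.
rows = [
    {"Date": "Q1-2017", "Completion Rate": 0.91},
    {"Date": "Q2-2017", "Completion Rate": 0.93},
    {"Date": "Q3-2017", "Completion Rate": 0.91},
    {"Date": "Q4-2017", "Completion Rate": 0.89},
    {"Date": "Q1-2018", "Completion Rate": 0.84},
    {"Date": "Q2-2018", "Completion Rate": 0.88},
    {"Date": "Q3-2018", "Completion Rate": 0.91},
    {"Date": "Q4-2018", "Completion Rate": 0.87},
    {"Date": "Q1-2019", "Completion Rate": 0.83},
]


def endpoints(records):
    """Return only the first and last records (hypothetical helper)."""
    if len(records) <= 1:
        return list(records)
    return [records[0], records[-1]]


# Only these two rows would be passed to the text layer of the chart.
labels = [f'{r["Date"]}: {r["Completion Rate"]:.2f}' for r in endpoints(rows)]
print(labels)
```

Deriving the endpoints from the data, rather than hard-coding the quarter names, keeps the labels correct if more quarters are added later.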
Step 3:
a. Graphic:
b. Identify some ways in which your design was affected by the things you read or the examples you saw in this assignment.
Tufte talked about how time-series graphics are great for richer, more complex, more difficult data, and he stressed the efficiency of data graphics. Therefore, I chose to layer the graphics instead of concatenating them, so that more data can be visualized efficiently in a smaller space. In addition, some of the graphics Tufte showed have two y axes, one on each side (example: page 15). This shows that it’s okay to have two different y axes as long as they are accurately presented.
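The layering choice described here corresponds to a Vega-Lite `layer` with `resolve` set so that each layer keeps its own y scale. The sketch below builds such a spec as a Python dictionary; the helper `rate_layer` and the dataset name `survey_rates` are assumptions for illustration, while the field names match the chart data for this exercise.

```python
import json


def rate_layer(field, color):
    """Build one line layer for a rate column (hypothetical helper)."""
    return {
        "mark": {"type": "line", "color": color, "point": True},
        "encoding": {
            # sort=None is written to JSON as null, which keeps the data order.
            "x": {"field": "Date", "type": "nominal", "sort": None,
                  "title": "Date (measured quarterly)"},
            "y": {"field": field, "type": "quantitative",
                  "axis": {"titleColor": color}},
        },
    }


layered_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "survey_rates"},  # placeholder dataset name
    "layer": [
        rate_layer("Completion Rate", "blue"),
        rate_layer("Response Rate", "orange"),
    ],
    # Each layer gets its own y scale, so one axis is drawn on each side.
    "resolve": {"scale": {"y": "independent"}},
}

print(json.dumps(layered_spec, indent=2))
```

Without the `resolve` entry the two rates would share one y scale, and the much smaller response rate would flatten near the bottom of the plot.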
Homework 5
Exercise 1
What marks are being used? What variables are mapped to which properties?
Points, lines, and text
X: time (year)
Y (pink graphic): Total federal employment
Y (blue graphic): Percent of all US jobs
What is the main story of this graphic?
Both the total number of federal employees and federal employment as a percent of all US jobs rose and then fell sharply in the 1940s, peaking in November 1944 at 3.14 million federal employees, or 7.45% of all US jobs.
From the 1950s to the 1990s, the total number of federal employees increased slightly; since the 1990s it has remained relatively stable, with some fluctuations.
Despite this stability and slight increase in the total number of federal employees, federal employment as a percent of all US jobs declined steadily from the 1950s to 2024, with small fluctuations.
What makes it a good graphic?
It has clear, informative labels for the x axis, the y axis, and key events and points in the graphic.
The colors are distinct from each other and consistent within each graphic.
What features do you think you would know how to implement in Vega-Lite?
I know how to do titles, line graphs, points, text, and color on the graphics. I can figure out how to have two graphics with the same x axis but different y axes using concatenation or repeat.
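One possible shape for the repeat approach is sketched below as a Python dictionary holding a Vega-Lite spec. The dataset name `federal_employment` and the column names `date`, `total_employment`, and `percent_of_jobs` are hypothetical, since the original graphic's data file is not part of this assignment.

```python
import json

repeat_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "federal_employment"},  # placeholder dataset name
    # One row per measure; both rows plot against the same date field.
    "repeat": {"row": ["total_employment", "percent_of_jobs"]},
    "spec": {
        "mark": "line",
        "encoding": {
            "x": {"field": "date", "type": "temporal", "title": "Year"},
            # The y field is filled in from the repeat list, one per row.
            "y": {"field": {"repeat": "row"}, "type": "quantitative"},
        },
    },
}

print(json.dumps(repeat_spec, indent=2))
```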
Are there any features of the graphic that you would not know how to do in Vega-Lite? If so, list them.
I don’t know how to implement the text callout (the word blob) and have it point to a specific point on the graphic.
I don’t know how to add the brand icon (the “USAFACTS” on the bottom left of the graphic).
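One way the callout might be approximated is with extra layers that carry their own one-row dataset: a point marking the peak and a text mark offset beside it. This is only a sketch. The dataset and field names are hypothetical, the November 1944 peak of 3.14 million comes from the description above, and using the first of the month as the date is an assumption.

```python
import json

# Single annotation row for the peak described in the main story.
peak = {"date": "1944-11-01", "employees_millions": 3.14,
        "label": "Nov 1944: 3.14 million"}

position = {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "employees_millions", "type": "quantitative"},
}

annotated_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "layer": [
        # Main line, drawn from a placeholder dataset supplied at render time.
        {"data": {"name": "federal_employment"},
         "mark": "line", "encoding": position},
        # Point that marks the annotated observation.
        {"data": {"values": [peak]},
         "mark": {"type": "point", "filled": True}, "encoding": position},
        # Text shifted to the right of the point so it does not cover the line.
        {"data": {"values": [peak]},
         "mark": {"type": "text", "align": "left", "dx": 8},
         "encoding": {**position,
                      "text": {"field": "label", "type": "nominal"}}},
    ],
}

print(json.dumps(annotated_spec, indent=2))
```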
Exercise 2
- Create a graphic that shows the high temperature in Seattle each day.
'
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"width": 600,
"height": 250,
"title": {
"text": "Weather",
"font": "serif",
"fontSize": 40,
"fontStyle": "italic",
"color": "royalblue"
},
"params":[
{
"name": "year",
"value": "2012",
"bind": {"input": "select", "options": ["2011", "2012","2013", "2014", "2015"]}
}
],
"transform": [
{
"calculate": "year(datum.date)",
"as": "year"
},
{"filter":"datum.year == year"}
],
"mark":"line",
"encoding": {
"y": {
"field": "temp_max",
"type": "quantitative",
"title": "High Temperature (°C)"
},
"x": {
"timeUnit": "monthdate",
"field": "date",
"type": "temporal",
"scale": {"zero": false},
"title": "Day of Year",
"axis": {
"format": "%b %d",
"labelAngle": -45,
"titleFontSize": 14
}
}
}
}
' |> as_vegaspec()
- Now modify this so that the temperatures for the same day of the year are overlaid on top of each other for the several years in the data set.
'{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"url": "https://calvin-data304.netlify.app/data/weather-with-dates.csv"
},
"height": 200,
"width": 100,
"columns": 1,
"title": {
"text": "Weather",
"font": "serif",
"fontSize": 40,
"fontStyle": "italic",
"color": "royalblue"
},
"transform": [
{
"calculate": "year(datum.date)",
"as": "year"
}
],
"facet": {
"field": "month",
"type": "nominal",
"sort":["1","2"]
},
"spec":{
"mark":"point",
"encoding": {
"y": {
"field": "temp_max",
"type": "quantitative",
"title": "High Temperature (°C)",
"scale":{"unionWith":[0,20]}
},
"x": {
"field": "day",
"type": "ordinal",
"scale": {"zero": false},
"title": "Day of month",
"sort": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31"]
},
"color":{"field": "year",
"type": "nominal"},
"fill": {
"field": "year",
"type": "nominal",
"title": "Year",
"legend": {"orient": "right"}
}
}
}
}' |> as_vegaspec()
- Create a graphic that shows how the different types of weather (rain, fog, etc.) are distributed by month in Seattle. When is it rainiest in Seattle? Sunniest?
import pandas as pd
import altair as alt
from vega_datasets import data
df = pd.read_csv("https://calvin-data304.netlify.app/data/weather-with-dates.csv")
df['month_name'] = pd.to_datetime(df['date']).dt.month_name()
month_order = [
"January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"
]
# make the chart
alt.Chart(df,width = 200, height = 150).mark_bar().encode(
x = alt.X('month_name:N',title = "Month", sort = month_order),
y = 'count()',
color = alt.Color("weather:N", title="Weather Type")).facet('weather:N', columns=3)
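To answer the rainiest and sunniest questions with numbers rather than by reading the bars, the counts can be tallied directly. The sketch below uses only the standard library. The helper `peak_month`, the `YYYY-MM-DD` date format, and the weather value `"sun"` are assumptions, and the sample rows are invented toy data, not values from the Seattle file.

```python
from collections import Counter
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def peak_month(rows, weather):
    """Return (month name, day count) for the month with the most days of
    the given weather type, or None if that type never occurs."""
    counts = Counter(
        MONTHS[date.fromisoformat(r["date"]).month - 1]
        for r in rows
        if r["weather"] == weather
    )
    return counts.most_common(1)[0] if counts else None


# Toy rows, made up purely to exercise the function.
toy_rows = [
    {"date": "2012-01-05", "weather": "rain"},
    {"date": "2012-01-06", "weather": "rain"},
    {"date": "2012-07-10", "weather": "sun"},
    {"date": "2013-01-07", "weather": "rain"},
    {"date": "2013-07-11", "weather": "sun"},
    {"date": "2013-11-02", "weather": "rain"},
]

print(peak_month(toy_rows, "rain"))
print(peak_month(toy_rows, "sun"))
```

With the real file, `rows` could come from `csv.DictReader`, provided the `date` and `weather` columns are present in the format assumed here.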